AITopics

2505.15511

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Orzechowski, Patryk, Moore, Jason H.

EBIC: an open source software for high-dimensional and big data biclustering analyses

arXiv.org Machine Learning Sep-4-2024

Motivation: In this paper we present the latest release of EBIC, a next-generation biclustering algorithm for mining genetic data. The major contribution of this paper is added support for big data, which makes it possible to run large genomic data mining analyses efficiently. Additional enhancements include integration with R and Bioconductor and an option to remove the influence of missing values on the final result. Results: EBIC was applied to datasets of different sizes, including a large DNA methylation dataset with 436,444 rows. For the largest dataset we observed a speedup of over 6.6-fold in computation time on a cluster of 8 GPUs compared to running the method on a single GPU, which indicates that the algorithm scales well. Availability: The latest version of EBIC can be downloaded from http://github.com/EpistasisLab/ebic. Installation and usage instructions are also available online.
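The speedup figure in this abstract can be turned into a parallel efficiency with one division. The short sketch below is only an illustration of that arithmetic; the function name `parallel_efficiency` is hypothetical and is not part of EBIC.

```python
def parallel_efficiency(speedup: float, n_devices: int) -> float:
    """Fraction of ideal linear speedup actually achieved."""
    if n_devices <= 0:
        raise ValueError("n_devices must be positive")
    return speedup / n_devices


# Figures quoted in the abstract: "over 6.6-fold" on 8 GPUs vs. 1 GPU,
# so the result is a lower bound on the efficiency.
efficiency = parallel_efficiency(6.6, 8)
print(f"{efficiency:.1%}")  # 82.5%
```

An efficiency of at least about 82.5% means each of the 8 GPUs contributes a little over four fifths of what perfect linear scaling would give.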

algorithm, dataset, ebic, (13 more...)

doi: 10.1093/bioinformatics/btz027

1807.09932

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.15)
Europe > Poland > Lesser Poland Province > Kraków (0.05)

Genre: Research Report (0.64)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.52)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.96)
Information Technology > Software (0.91)
Information Technology > Data Science > Data Mining > Big Data (0.62)

arXiv.org Artificial Intelligence Apr-16-2024

LLMem: Estimating GPU Memory Usage for Fine-Tuning Pre-Trained LLMs

Kim, Taeho, Wang, Yanming, Chaturvedi, Vatshank, Gupta, Lokesh, Kim, Seyeon, Kwon, Yongin, Ha, Sangtae

Fine-tuning pre-trained large language models (LLMs) on limited hardware is challenging because of GPU memory constraints. Various distributed fine-tuning methods have been proposed to alleviate these constraints. However, it remains unclear how to determine which method achieves the fastest fine-tuning while avoiding GPU out-of-memory errors in a given environment. To address this challenge, we introduce LLMem, a solution that estimates GPU memory consumption when applying distributed fine-tuning methods across multiple GPUs and identifies the optimal method. We estimate GPU memory usage before fine-tuning, leveraging the fundamental structure of transformer-based decoder models and the memory usage distribution of each method. Experimental results show that LLMem accurately estimates peak GPU memory usage on a single GPU, with error rates of up to 1.6%. Additionally, it shows an average error rate of 3.0% when applying distributed fine-tuning methods to LLMs with more than a billion parameters on multi-GPU setups.
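To give a feel for the kind of quantity being estimated, here is a back-of-the-envelope sketch. It is not LLMem's estimator: it assumes the common rule of thumb for mixed-precision Adam training of roughly 16 bytes per parameter (2 for half-precision weights, 2 for gradients, 12 for fp32 master weights plus the two Adam moments), it ignores activations, temporary buffers, and fragmentation, and the function names are hypothetical.

```python
GIB = 1024 ** 3


def model_state_bytes(n_params: int,
                      weight_bytes: int = 2,       # assumed fp16/bf16 weights
                      grad_bytes: int = 2,         # assumed fp16/bf16 gradients
                      optimizer_bytes: int = 12    # assumed fp32 master copy + Adam m, v
                      ) -> int:
    """Bytes for weights, gradients, and optimizer state (activations excluded)."""
    return n_params * (weight_bytes + grad_bytes + optimizer_bytes)


def per_gpu_bytes(n_params: int, n_gpus: int = 1) -> float:
    """Idealised per-GPU share if all model state is sharded evenly."""
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1")
    return model_state_bytes(n_params) / n_gpus


one_billion = 1_000_000_000
print(f"{model_state_bytes(one_billion) / GIB:.1f} GiB")  # 14.9 GiB
print(f"{per_gpu_bytes(one_billion, 4) / GIB:.1f} GiB")   # 3.7 GiB
```

Real peak usage is higher than this floor, which is why a dedicated estimator that models each distributed method is useful.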

fine-tuning, gpu memory usage, memory usage, (14 more...)

2404.10933

Country:

North America > United States > Colorado > Boulder County > Boulder (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Graphics (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence Mar-19-2024

JORA: JAX Tensor-Parallel LoRA Library for Retrieval Augmented Fine-Tuning

Tahir, Anique, Cheng, Lu, Liu, Huan

The scaling of Large Language Models (LLMs) for retrieval-based tasks, particularly in Retrieval Augmented Generation (RAG), faces significant memory constraints, especially when fine-tuning extensive prompt sequences. Current open-source libraries support full-model inference and fine-tuning across multiple GPUs but fall short of accommodating the efficient parameter distribution required for retrieved context. Addressing this gap, we introduce a novel framework for PEFT-compatible fine-tuning of Llama-2 models, leveraging distributed training. Our framework uniquely utilizes JAX's just-in-time (JIT) compilation and tensor-sharding for efficient resource management, thereby enabling accelerated fine-tuning with reduced memory requirements. This advancement significantly improves the scalability and feasibility of fine-tuning LLMs for complex RAG applications, even on systems with limited GPU resources. Our experiments show more than 12x improvement in runtime compared to the Hugging Face/DeepSpeed implementation with four GPUs, while consuming less than half the VRAM per GPU.

fine-tuning, jora, library, (14 more...)

2403.11366

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > Arizona > Maricopa County > Tempe (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence Jun-23-2023

Computron: Serving Distributed Deep Learning Models with Model Parallel Swapping

Zou, Daniel, Jin, Xinchen, Yu, Xueyang, Zhang, Hao, Demmel, James

Many of the most performant deep learning models today in fields like language and image understanding are fine-tuned models that contain billions of parameters. In anticipation of workloads that involve serving many such large models to handle different tasks, we develop Computron, a system that uses memory swapping to serve multiple distributed models on a shared GPU cluster. Computron implements a model parallel swapping design that takes advantage of the aggregate CPU-GPU link bandwidth of a cluster to speed up model parameter transfers. This design makes swapping large models feasible and can improve resource utilization. We demonstrate that Computron successfully parallelizes model swapping on multiple GPUs, and we test it on randomized workloads to show how it can tolerate real-world variability factors like burstiness and skewed request rates. Computron's source code is available at https://github.com/dlzou/computron.
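The benefit of using aggregate CPU-GPU link bandwidth can be illustrated with an idealised calculation. The sketch below is not Computron's code: it assumes the model is split evenly across GPUs, that each GPU loads its shard over its own independent link, and that there is no other overhead. The function name and the numbers (a 24 GB model, 12 GB/s per link) are hypothetical.

```python
def swap_time_seconds(model_gb: float, link_gb_per_s: float, n_gpus: int = 1) -> float:
    """Idealised time to load a model whose parameters are split evenly
    across n_gpus, each GPU pulling its shard in parallel over its own link."""
    if model_gb < 0 or link_gb_per_s <= 0 or n_gpus < 1:
        raise ValueError("invalid model size, bandwidth, or GPU count")
    return (model_gb / n_gpus) / link_gb_per_s


# Hypothetical figures chosen only to make the arithmetic easy to follow.
single = swap_time_seconds(24.0, 12.0, n_gpus=1)
sharded = swap_time_seconds(24.0, 12.0, n_gpus=4)
print(single, sharded)  # 2.0 0.5
```

Under these assumptions the transfer time falls in proportion to the number of links used, which is the effect the abstract attributes to model parallel swapping.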

artificial intelligence, latency, machine learning, (16 more...)

2306.13835

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
North America > United States > California > San Diego County > San Diego (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

#artificialintelligence Nov-16-2022, 07:01:30 GMT

Understanding Memory Requirements for Deep Learning and Machine Learning

Building a machine learning workstation can be difficult, and choosing one that meets the memory requirements of machine learning is harder still. There are a lot of moving parts, depending on the types of projects you plan to run. Understanding machine learning memory requirements is a critical part of the building process, though it is easy to overlook. The average memory requirement is 16 GB of RAM, but some applications require more memory.

learning, machine learning, memory requirement, (12 more...)

Technology:

Information Technology > Hardware > Memory (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

#artificialintelligence Oct-11-2022, 16:12:35 GMT

9 libraries for parallel & distributed training/inference of deep learning models

In this blog we will cover a few basics of large model training before jumping to the list of libraries available. Large deep learning models require a significant amount of memory to train. Models need memory to store intermediate activations, weights, and other state during training. Some models can be trained only with a very small batch size on a single GPU, while other models may not fit on a single GPU at all.

deep learning model, library, training inference, (12 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

#artificialintelligence Sep-7-2022, 02:02:50 GMT

Fundamentals of Deep Learning for Multi-GPUs (Day 2)

Note: By registering for Day 1 you will automatically be registered for Day 2. You cannot register for Day 2 separately. This page is a placeholder. This workshop teaches you techniques for training deep neural networks on multiple GPUs to shorten the training time required for data-intensive applications. Working with deep learning tools, frameworks, and workflows to perform neural network training, you'll learn concepts for implementing multi-GPU training in PyTorch, reducing the complexity of writing efficient distributed software, and maintaining accuracy when training a model across many GPUs. Workshop format: Interactive presentation with hands-on exercises. Target audience: This workshop is intended for researchers who would like to use multiple GPUs to train deep learning models in PyTorch. Knowledge prerequisites: Participants should be comfortable with training deep learning models using a single GPU.

deep learning model, multi-gpus, training deep learning model, (4 more...)

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence Aug-8-2022

A Frequency-aware Software Cache for Large Recommendation System Embeddings

Fang, Jiarui, Zhang, Geng, Han, Jiatong, Li, Shenggui, Bian, Zhengda, Li, Yongbin, Liu, Jin, You, Yang

Deep learning recommendation models (DLRMs) have been widely applied in Internet companies. The embedding tables of DLRMs are too large to fit entirely in GPU memory. We propose a GPU-based software cache approach that dynamically manages the embedding table across CPU and GPU memory by leveraging the ID frequency statistics of the target dataset. Our proposed software cache is efficient in training entire DLRMs on GPU in a synchronized update manner. It also scales to multiple GPUs in combination with the widely used hybrid parallel training approaches. Evaluating our prototype system shows that we can keep only 1.5% of the embedding parameters in the GPU and still obtain a decent end-to-end training speed.
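The idea of using ID frequency statistics to decide which embedding rows stay in fast memory can be sketched in a few lines. This is a toy illustration, not the authors' implementation: it picks a static hot set from access counts, uses plain Python sets in place of GPU memory, and the names `build_hot_set` and `hit_rate` as well as the access pattern are hypothetical.

```python
from collections import Counter


def build_hot_set(id_stream, cache_fraction, n_rows):
    """Return the most frequently accessed row IDs, up to the cache capacity."""
    capacity = max(1, round(n_rows * cache_fraction))
    counts = Counter(id_stream)
    return {row_id for row_id, _ in counts.most_common(capacity)}


def hit_rate(id_stream, hot_set):
    """Fraction of lookups served from the hot set."""
    hits = sum(1 for row_id in id_stream if row_id in hot_set)
    return hits / len(id_stream)


# Toy skewed access pattern over a 200-row table: rows 0-2 dominate.
accesses = [0] * 50 + [1] * 30 + [2] * 10 + list(range(3, 13))
hot = build_hot_set(accesses, cache_fraction=0.015, n_rows=200)
print(sorted(hot), hit_rate(accesses, hot))  # [0, 1, 2] 0.9
```

With a skewed distribution like this one, caching 1.5% of the rows (3 of 200) serves 90% of the toy lookups, which is the intuition behind keeping only a small share of the parameters on the GPU.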

dlrm, gpu, idx, (14 more...)

2208.05321

Country:

Asia > China > Hubei Province > Wuhan (0.04)
Asia > Singapore (0.04)

Genre: Research Report (0.40)

Industry: Information Technology (0.48)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.65)

#artificialintelligence Mar-6-2022, 10:55:08 GMT

GitHub - royorel/StyleSDF

Training files will be released soon. StyleSDF is trained only on single-view RGB data. The 3D geometry is learned implicitly with an SDF-based volume renderer. We introduce a high-resolution, 3D-consistent image and shape generation technique which we call StyleSDF. Our method is trained on single-view RGB data only, and stands on the shoulders of StyleGAN2 for image generation, while solving two main challenges in 3D-aware GANs: 1) high-resolution, view-consistent generation of the RGB images, and 2) detailed 3D shape.

renderer, royorel stylesdf, stylesdf, (14 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.57)
Information Technology > Artificial Intelligence > Vision (0.37)